[mlir] Report line number from file rather than split #150982

jpienaar · 2025-07-28T16:13:19Z

Add convention for lexer if the last file is contained in the first, then the first is used for error reporting. This requires that these two overlap to make it easy to find the corresponding spots. Enables going from

within split at mlir/test/IR/invalid.mlir::10 offset :6:9: error: reference to an undefined block

to

mlir/test/IR/invalid.mlir:15:9: error: reference to an undefined block

This does change the split to not produce always null terminated buffers and tools that need it to do so themselves (which is mostly by copying - this may have no actual impact as previously this was a copy too). Alternatively I could copy both and then split buffers are always null-terminated, but that seems worse.

llvmbot · 2025-07-28T16:13:53Z

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-core

Author: Jacques Pienaar (jpienaar)

Changes

Add convention for lexer if the last file is contained in the first, then the first is used for error reporting. This requires that these two overlap to make it easy to find the corresponding spots. Enables going from

within split at mlir/test/IR/invalid.mlir::10 offset :6:9: error: reference to an undefined block

to

mlir/test/IR/invalid.mlir:15:9: error: reference to an undefined block

Along the way had to remove need for buffer to be null-terminated when parsing MLIR buffers. This does change the split to not produce always null terminated buffers and tools that need it to do so themselves (which is mostly by copying - this may have no actual impact as previously this was a copy too). Alternatively I could copy both and then split buffers are always null-terminated, but that seems worse.

Full diff: https://github.com/llvm/llvm-project/pull/150982.diff

12 Files Affected:

(modified) mlir/include/mlir/IR/Diagnostics.h (+4)
(modified) mlir/include/mlir/Support/ToolUtilities.h (+15)
(modified) mlir/lib/AsmParser/DialectSymbolParser.cpp (+7)
(modified) mlir/lib/AsmParser/Lexer.cpp (+24-3)
(modified) mlir/lib/AsmParser/Lexer.h (+3)
(modified) mlir/lib/IR/Diagnostics.cpp (+12-9)
(modified) mlir/lib/Support/ToolUtilities.cpp (+30-9)
(modified) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp (+38-17)
(modified) mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp (+5)
(added) mlir/test/IR/diagnostic-nosplit.mlir (+13)
(modified) mlir/test/IR/top-level.mlir (+2-2)
(modified) mlir/tools/mlir-pdll/mlir-pdll.cpp (+4)

diff --git a/mlir/include/mlir/IR/Diagnostics.h b/mlir/include/mlir/IR/Diagnostics.h
index 4ed0423902138..7ff718ad7f241 100644
--- a/mlir/include/mlir/IR/Diagnostics.h
+++ b/mlir/include/mlir/IR/Diagnostics.h
@@ -639,6 +639,10 @@ class SourceMgrDiagnosticVerifierHandler : public SourceMgrDiagnosticHandler {
   /// verified correctly, failure otherwise.
   LogicalResult verify();
 
+  /// Register this handler with the given context. This is intended for use
+  /// with the splitAndProcessBuffer function.
+  void registerInContext(MLIRContext *ctx);
+
 private:
   /// Process a single diagnostic.
   void process(Diagnostic &diag);
diff --git a/mlir/include/mlir/Support/ToolUtilities.h b/mlir/include/mlir/Support/ToolUtilities.h
index cb6ba299d28f1..1c8d1296b8c70 100644
--- a/mlir/include/mlir/Support/ToolUtilities.h
+++ b/mlir/include/mlir/Support/ToolUtilities.h
@@ -21,10 +21,16 @@
 
 namespace llvm {
 class MemoryBuffer;
+class MemoryBufferRef;
 } // namespace llvm
 
 namespace mlir {
+// A function that processes a chunk of a buffer and writes the result to an
+// output stream.
 using ChunkBufferHandler = function_ref<LogicalResult(
+    std::unique_ptr<llvm::MemoryBuffer> chunkBuffer,
+    const llvm::MemoryBufferRef &sourceBuffer, raw_ostream &os)>;
+using NoSourceChunkBufferHandler = function_ref<LogicalResult(
     std::unique_ptr<llvm::MemoryBuffer> chunkBuffer, raw_ostream &os)>;
 
 extern inline const char *const kDefaultSplitMarker = "// -----";
@@ -45,6 +51,15 @@ splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer> originalBuffer,
                       ChunkBufferHandler processChunkBuffer, raw_ostream &os,
                       llvm::StringRef inputSplitMarker = kDefaultSplitMarker,
                       llvm::StringRef outputSplitMarker = "");
+
+/// Same as above, but for case where the original buffer is not used in any
+/// form.
+LogicalResult
+splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer> originalBuffer,
+                      NoSourceChunkBufferHandler processChunkBuffer,
+                      raw_ostream &os,
+                      llvm::StringRef inputSplitMarker = kDefaultSplitMarker,
+                      llvm::StringRef outputSplitMarker = "");
 } // namespace mlir
 
 #endif // MLIR_SUPPORT_TOOLUTILITIES_H
diff --git a/mlir/lib/AsmParser/DialectSymbolParser.cpp b/mlir/lib/AsmParser/DialectSymbolParser.cpp
index 9f4a87a6a02de..984242798e2a2 100644
--- a/mlir/lib/AsmParser/DialectSymbolParser.cpp
+++ b/mlir/lib/AsmParser/DialectSymbolParser.cpp
@@ -89,6 +89,7 @@ ParseResult Parser::parseDialectSymbolBody(StringRef &body,
     nestedPunctuation.pop_back();
     return success();
   };
+  const char* curBufferEnd = state.lex.getBufferEnd();
   do {
     // Handle code completions, which may appear in the middle of the symbol
     // body.
@@ -98,6 +99,12 @@ ParseResult Parser::parseDialectSymbolBody(StringRef &body,
       break;
     }
 
+    if (curBufferEnd == curPtr) {
+      if (!nestedPunctuation.empty())
+        return emitPunctError();
+      return emitError("unexpected nul or EOF in pretty dialect name");
+    }
+
     char c = *curPtr++;
     switch (c) {
     case '\0':
diff --git a/mlir/lib/AsmParser/Lexer.cpp b/mlir/lib/AsmParser/Lexer.cpp
index 751bd63e537f8..8f53529823e23 100644
--- a/mlir/lib/AsmParser/Lexer.cpp
+++ b/mlir/lib/AsmParser/Lexer.cpp
@@ -37,6 +37,18 @@ Lexer::Lexer(const llvm::SourceMgr &sourceMgr, MLIRContext *context,
              AsmParserCodeCompleteContext *codeCompleteContext)
     : sourceMgr(sourceMgr), context(context), codeCompleteLoc(nullptr) {
   auto bufferID = sourceMgr.getMainFileID();
+
+  // Check to see if the main buffer contains the last buffer, and if so the
+  // last buffer should be used as main file for parsing.
+  if (sourceMgr.getNumBuffers() > 1) {
+    unsigned lastFileID = sourceMgr.getNumBuffers();
+    const llvm::MemoryBuffer *main = sourceMgr.getMemoryBuffer(bufferID);
+    const llvm::MemoryBuffer *last = sourceMgr.getMemoryBuffer(lastFileID);
+    if (main->getBufferStart() <= last->getBufferStart() &&
+        main->getBufferEnd() >= last->getBufferEnd()) {
+      bufferID = lastFileID;
+    }
+  }
   curBuffer = sourceMgr.getMemoryBuffer(bufferID)->getBuffer();
   curPtr = curBuffer.begin();
 
@@ -71,6 +83,7 @@ Token Lexer::emitError(const char *loc, const Twine &message) {
 }
 
 Token Lexer::lexToken() {
+  const char *curBufferEnd = curBuffer.end();
   while (true) {
     const char *tokStart = curPtr;
 
@@ -78,6 +91,9 @@ Token Lexer::lexToken() {
     if (tokStart == codeCompleteLoc)
       return formToken(Token::code_complete, tokStart);
 
+    if (tokStart == curBufferEnd)
+      return formToken(Token::eof, tokStart);
+
     // Lex the next token.
     switch (*curPtr++) {
     default:
@@ -102,7 +118,7 @@ Token Lexer::lexToken() {
     case 0:
       // This may either be a nul character in the source file or may be the EOF
       // marker that llvm::MemoryBuffer guarantees will be there.
-      if (curPtr - 1 == curBuffer.end())
+      if (curPtr - 1 == curBufferEnd)
         return formToken(Token::eof, tokStart);
       continue;
 
@@ -259,7 +275,11 @@ void Lexer::skipComment() {
   assert(*curPtr == '/');
   ++curPtr;
 
+  const char *curBufferEnd = curBuffer.end();
   while (true) {
+    if (curPtr == curBufferEnd)
+      return;
+
     switch (*curPtr++) {
     case '\n':
     case '\r':
@@ -267,7 +287,7 @@ void Lexer::skipComment() {
       return;
     case 0:
       // If this is the end of the buffer, end the comment.
-      if (curPtr - 1 == curBuffer.end()) {
+      if (curPtr - 1 == curBufferEnd) {
         --curPtr;
         return;
       }
@@ -405,6 +425,7 @@ Token Lexer::lexPrefixedIdentifier(const char *tokStart) {
 Token Lexer::lexString(const char *tokStart) {
   assert(curPtr[-1] == '"');
 
+  const char *curBufferEnd = curBuffer.end();
   while (true) {
     // Check to see if there is a code completion location within the string. In
     // these cases we generate a completion location and place the currently
@@ -419,7 +440,7 @@ Token Lexer::lexString(const char *tokStart) {
     case 0:
       // If this is a random nul character in the middle of a string, just
       // include it.  If it is the end of file, then it is an error.
-      if (curPtr - 1 != curBuffer.end())
+      if (curPtr - 1 != curBufferEnd)
         continue;
       [[fallthrough]];
     case '\n':
diff --git a/mlir/lib/AsmParser/Lexer.h b/mlir/lib/AsmParser/Lexer.h
index 4085a9b73854b..670444eb1f5b4 100644
--- a/mlir/lib/AsmParser/Lexer.h
+++ b/mlir/lib/AsmParser/Lexer.h
@@ -40,6 +40,9 @@ class Lexer {
   /// Returns the start of the buffer.
   const char *getBufferBegin() { return curBuffer.data(); }
 
+  /// Returns the end of the buffer.
+  const char *getBufferEnd() { return curBuffer.end(); }
+
   /// Return the code completion location of the lexer, or nullptr if there is
   /// none.
   const char *getCodeCompleteLoc() const { return codeCompleteLoc; }
diff --git a/mlir/lib/IR/Diagnostics.cpp b/mlir/lib/IR/Diagnostics.cpp
index 3e337951bcd3f..776b5c6588c71 100644
--- a/mlir/lib/IR/Diagnostics.cpp
+++ b/mlir/lib/IR/Diagnostics.cpp
@@ -821,15 +821,7 @@ SourceMgrDiagnosticVerifierHandler::SourceMgrDiagnosticVerifierHandler(
   for (unsigned i = 0, e = mgr.getNumBuffers(); i != e; ++i)
     (void)impl->computeExpectedDiags(out, mgr, mgr.getMemoryBuffer(i + 1));
 
-  // Register a handler to verify the diagnostics.
-  setHandler([&](Diagnostic &diag) {
-    // Process the main diagnostics.
-    process(diag);
-
-    // Process each of the notes.
-    for (auto &note : diag.getNotes())
-      process(note);
-  });
+  registerInContext(ctx);
 }
 
 SourceMgrDiagnosticVerifierHandler::SourceMgrDiagnosticVerifierHandler(
@@ -862,6 +854,17 @@ LogicalResult SourceMgrDiagnosticVerifierHandler::verify() {
   return impl->status;
 }
 
+void SourceMgrDiagnosticVerifierHandler::registerInContext(MLIRContext *ctx) {
+  ctx->getDiagEngine().registerHandler([&](Diagnostic &diag) {
+    // Process the main diagnostics.
+    process(diag);
+
+    // Process each of the notes.
+    for (auto &note : diag.getNotes())
+      process(note);
+  });
+}
+
 /// Process a single diagnostic.
 void SourceMgrDiagnosticVerifierHandler::process(Diagnostic &diag) {
   return process(diag.getLocation(), diag.str(), diag.getSeverity());
diff --git a/mlir/lib/Support/ToolUtilities.cpp b/mlir/lib/Support/ToolUtilities.cpp
index 748f92847ac58..2cf33eb30d06a 100644
--- a/mlir/lib/Support/ToolUtilities.cpp
+++ b/mlir/lib/Support/ToolUtilities.cpp
@@ -14,6 +14,8 @@
 #include "mlir/Support/LLVM.h"
 #include "llvm/Support/SourceMgr.h"
 #include "llvm/Support/raw_ostream.h"
+#include <string>
+#include <utility>
 
 using namespace mlir;
 
@@ -22,18 +24,18 @@ mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer> originalBuffer,
                             ChunkBufferHandler processChunkBuffer,
                             raw_ostream &os, llvm::StringRef inputSplitMarker,
                             llvm::StringRef outputSplitMarker) {
+  llvm::MemoryBufferRef originalBufferRef = originalBuffer->getMemBufferRef();
   // If splitting is disabled, we process the full input buffer.
   if (inputSplitMarker.empty())
-    return processChunkBuffer(std::move(originalBuffer), os);
+    return processChunkBuffer(std::move(originalBuffer), originalBufferRef, os);
 
   const int inputSplitMarkerLen = inputSplitMarker.size();
 
-  auto *origMemBuffer = originalBuffer.get();
   SmallVector<StringRef, 8> rawSourceBuffers;
   const int checkLen = 2;
   // Split dropping the last checkLen chars to enable flagging near misses.
-  origMemBuffer->getBuffer().split(rawSourceBuffers,
-                                   inputSplitMarker.drop_back(checkLen));
+  originalBufferRef.getBuffer().split(rawSourceBuffers,
+                                      inputSplitMarker.drop_back(checkLen));
   if (rawSourceBuffers.empty())
     return success();
 
@@ -79,11 +81,17 @@ mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer> originalBuffer,
   auto interleaveFn = [&](StringRef subBuffer) {
     auto splitLoc = SMLoc::getFromPointer(subBuffer.data());
     unsigned splitLine = fileSourceMgr.getLineAndColumn(splitLoc).first;
-    auto subMemBuffer = llvm::MemoryBuffer::getMemBufferCopy(
-        subBuffer, Twine("within split at ") +
-                       origMemBuffer->getBufferIdentifier() + ":" +
-                       Twine(splitLine) + " offset ");
-    if (failed(processChunkBuffer(std::move(subMemBuffer), os)))
+    std::string name((Twine("within split at ") +
+                      originalBufferRef.getBufferIdentifier() + ":" +
+                      Twine(splitLine) + " offset ")
+                         .str());
+    // Use MemoryBufferRef to avoid copying the buffer & keep at same location
+    // relative to the original buffer.
+    auto subMemBuffer =
+        llvm::MemoryBuffer::getMemBuffer(llvm::MemoryBufferRef(subBuffer, name),
+                                         /*RequiresNullTerminator=*/false);
+    if (failed(
+            processChunkBuffer(std::move(subMemBuffer), originalBufferRef, os)))
       hadFailure = true;
   };
   llvm::interleave(sourceBuffers, os, interleaveFn,
@@ -92,3 +100,16 @@ mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer> originalBuffer,
   // If any fails, then return a failure of the tool.
   return failure(hadFailure);
 }
+
+LogicalResult
+mlir::splitAndProcessBuffer(std::unique_ptr<llvm::MemoryBuffer> originalBuffer,
+                            NoSourceChunkBufferHandler processChunkBuffer,
+                            raw_ostream &os, llvm::StringRef inputSplitMarker,
+                            llvm::StringRef outputSplitMarker) {
+  auto process = [&](std::unique_ptr<llvm::MemoryBuffer> chunkBuffer,
+                     const llvm::MemoryBufferRef &, raw_ostream &os) {
+    return processChunkBuffer(std::move(chunkBuffer), os);
+  };
+  return splitAndProcessBuffer(std::move(originalBuffer), process, os,
+                               inputSplitMarker, outputSplitMarker);
+}
diff --git a/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp b/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
index 31e0caa768113..fe95b1b969ddb 100644
--- a/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
+++ b/mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
@@ -515,13 +515,20 @@ performActions(raw_ostream &os,
 
 /// Parses the memory buffer.  If successfully, run a series of passes against
 /// it and print the result.
-static LogicalResult processBuffer(raw_ostream &os,
-                                   std::unique_ptr<MemoryBuffer> ownedBuffer,
-                                   const MlirOptMainConfig &config,
-                                   DialectRegistry &registry,
-                                   llvm::ThreadPoolInterface *threadPool) {
+static LogicalResult
+processBuffer(raw_ostream &os, std::unique_ptr<MemoryBuffer> ownedBuffer,
+              llvm::MemoryBufferRef sourceBuffer,
+              const MlirOptMainConfig &config, DialectRegistry &registry,
+              SourceMgrDiagnosticVerifierHandler *verifyHandler,
+              llvm::ThreadPoolInterface *threadPool) {
   // Tell sourceMgr about this buffer, which is what the parser will pick up.
   auto sourceMgr = std::make_shared<SourceMgr>();
+  // Add the original buffer to the source manager to use for determining
+  // locations.
+  sourceMgr->AddNewSourceBuffer(
+      llvm::MemoryBuffer::getMemBuffer(sourceBuffer,
+                                       /*RequiresNullTerminator=*/false),
+      SMLoc());
   sourceMgr->AddNewSourceBuffer(std::move(ownedBuffer), SMLoc());
 
   // Create a context just for the current buffer. Disable threading on creation
@@ -529,6 +536,8 @@ static LogicalResult processBuffer(raw_ostream &os,
   MLIRContext context(registry, MLIRContext::Threading::DISABLED);
   if (threadPool)
     context.setThreadPool(*threadPool);
+  if (verifyHandler)
+    verifyHandler->registerInContext(&context);
 
   StringRef irdlFile = config.getIrdlFile();
   if (!irdlFile.empty() && failed(loadIRDLDialects(irdlFile, context)))
@@ -552,17 +561,12 @@ static LogicalResult processBuffer(raw_ostream &os,
     return performActions(os, sourceMgr, &context, config);
   }
 
-  SourceMgrDiagnosticVerifierHandler sourceMgrHandler(
-      *sourceMgr, &context, config.verifyDiagnosticsLevel());
-
   // Do any processing requested by command line flags.  We don't care whether
   // these actions succeed or fail, we only care what diagnostics they produce
   // and whether they match our expectations.
   (void)performActions(os, sourceMgr, &context, config);
 
-  // Verify the diagnostic handler to make sure that each of the diagnostics
-  // matched.
-  return sourceMgrHandler.verify();
+  return success();
 }
 
 std::pair<std::string, std::string>
@@ -631,14 +635,31 @@ LogicalResult mlir::MlirOptMain(llvm::raw_ostream &outputStream,
   if (threadPoolCtx.isMultithreadingEnabled())
     threadPool = &threadPoolCtx.getThreadPool();
 
+  SourceMgr sourceMgr;
+  sourceMgr.AddNewSourceBuffer(
+      llvm::MemoryBuffer::getMemBuffer(buffer->getMemBufferRef(),
+                                       /*RequiresNullTerminator=*/false),
+      SMLoc());
+  // Note: this creates a verifier handler independent of the the flag set, as
+  // internally if the flag is not set, a new scoped diagnostic handler is
+  // created which would intercept the diagnostics and verify them.
+  SourceMgrDiagnosticVerifierHandler sourceMgrHandler(
+      sourceMgr, &threadPoolCtx, config.verifyDiagnosticsLevel());
   auto chunkFn = [&](std::unique_ptr<MemoryBuffer> chunkBuffer,
-                     raw_ostream &os) {
-    return processBuffer(os, std::move(chunkBuffer), config, registry,
-                         threadPool);
+                     llvm::MemoryBufferRef sourceBuffer, raw_ostream &os) {
+    return processBuffer(
+        os, std::move(chunkBuffer), sourceBuffer, config, registry,
+        config.shouldVerifyDiagnostics() ? &sourceMgrHandler : nullptr,
+        threadPool);
   };
-  return splitAndProcessBuffer(std::move(buffer), chunkFn, outputStream,
-                               config.inputSplitMarker(),
-                               config.outputSplitMarker());
+  LogicalResult status = splitAndProcessBuffer(
+      llvm::MemoryBuffer::getMemBuffer(buffer->getMemBufferRef(),
+                                       /*RequiresNullTerminator=*/false),
+      chunkFn, outputStream, config.inputSplitMarker(),
+      config.outputSplitMarker());
+  if (config.shouldVerifyDiagnostics() && failed(sourceMgrHandler.verify()))
+    status = failure();
+  return status;
 }
 
 LogicalResult mlir::MlirOptMain(int argc, char **argv,
diff --git a/mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp b/mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp
index f2a81cca2abe5..7e9d6940c8afd 100644
--- a/mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp
+++ b/mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp
@@ -138,6 +138,11 @@ LogicalResult mlir::mlirTranslateMain(int argc, char **argv,
   // Processes the memory buffer with a new MLIRContext.
   auto processBuffer = [&](std::unique_ptr<llvm::MemoryBuffer> ownedBuffer,
                            raw_ostream &os) {
+    // Many of the translations expect a null-terminated buffer while splitting
+    // the buffer does not guarantee null-termination. Make a copy of the buffer
+    // to ensure null-termination.
+    ownedBuffer = llvm::MemoryBuffer::getMemBufferCopy(
+        ownedBuffer->getBuffer(), ownedBuffer->getBufferIdentifier());
     // Temporary buffers for chained translation processing.
     std::string dataIn;
     std::string dataOut;
diff --git a/mlir/test/IR/diagnostic-nosplit.mlir b/mlir/test/IR/diagnostic-nosplit.mlir
new file mode 100644
index 0000000000000..ecfb9c62469b5
--- /dev/null
+++ b/mlir/test/IR/diagnostic-nosplit.mlir
@@ -0,0 +1,13 @@
+// RUN: not mlir-opt %s -o - --split-input-file 2>&1 | FileCheck %s
+// This test verifies that diagnostic handler doesn't emit splits.
+
+
+// -----
+
+
+
+func.func @constant_out_of_range() {
+  // CHECK: mlir:11:8: error: 'arith.constant'
+  %x = "arith.constant"() {value = 100} : () -> i1
+  return
+}
diff --git a/mlir/test/IR/top-level.mlir b/mlir/test/IR/top-level.mlir
index b571d944928c8..e0adb4d823344 100644
--- a/mlir/test/IR/top-level.mlir
+++ b/mlir/test/IR/top-level.mlir
@@ -6,10 +6,10 @@ func.func private @foo()
 
 // -----
 
-// expected-error@-3 {{source must contain a single top-level operation, found: 2}}
+// expected-error@-9 {{source must contain a single top-level operation, found: 2}}
 func.func private @bar()
 func.func private @baz()
 
 // -----
 
-// expected-error@-3 {{source must contain a single top-level operation, found: 0}}
+// expected-error@-15 {{source must contain a single top-level operation, found: 0}}
diff --git a/mlir/tools/mlir-pdll/mlir-pdll.cpp b/mlir/tools/mlir-pdll/mlir-pdll.cpp
index 88a5f3639c962..b73dc85ee2d22 100644
--- a/mlir/tools/mlir-pdll/mlir-pdll.cpp
+++ b/mlir/tools/mlir-pdll/mlir-pdll.cpp
@@ -201,6 +201,10 @@ int main(int argc, char **argv) {
   llvm::raw_string_ostream outputStrOS(outputStr);
   auto processFn = [&](std::unique_ptr<llvm::MemoryBuffer> chunkBuffer,
                        raw_ostream &os) {
+    // Split does not guarantee null-termination. Make a copy of the buffer to
+    // ensure null-termination.
+    chunkBuffer = llvm::MemoryBuffer::getMemBufferCopy(
+        chunkBuffer->getBuffer(), chunkBuffer->getBufferIdentifier());
     return processBuffer(os, std::move(chunkBuffer), outputType, includeDirs,
                          dumpODS, includedFiles);
   };

github-actions · 2025-07-28T16:15:40Z

✅ With the latest revision this PR passed the C/C++ code formatter.

joker-eph

LG overall!

Can you split the lexer fixes out in a separate commit?

mlir/include/mlir/Support/ToolUtilities.h

mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp

Add convention for lexer if the last file is contained in the first, then the first is used for error reporting. This requires that these two overlap to make it easy to find the corresponding spots. Enables going from ``` within split at mlir/test/IR/invalid.mlir::10 offset :6:9: error: reference to an undefined block ``` to ``` mlir/test/IR/invalid.mlir:15:9: error: reference to an undefined block ``` This does change the split to not produce always null terminated buffers and tools that need it to do so themselves (which is mostly by copying - this may have no actual impact as previously this was a copy too). Alternatively I could copy both and then split buffers are always null-terminated, but that seems worse.

hanhanW · 2025-07-29T18:25:35Z

Hi, this commit breaks our downstream project tests; I wonder if it is expected. Inline the snippet:

// RUN: iree-opt --split-input-file \
// RUN:   --pass-pipeline='builtin.module(iree-util-test-conversion{widen-integers}, cse, canonicalize)' \
// RUN:   --verify-diagnostics %s | FileCheck %s

// -----
// Must be rank-0 or rank-1.
// expected-error @-3 {{conversion to util failed}}
util.func @verify_invalid_rank_2(%buffer: memref<4x2xf32>, %idx: index) -> f32{
  // expected-error @below {{failed to legalize operation 'memref.load'}}
  %0 = memref.load %buffer[%idx, %idx] : memref<4x2xf32>
  util.return %0 : f32
}

// -----
// Must have an identity map.
// expected-error @-3 {{conversion to util failed}}
#map = affine_map<(d0)[s0] -> (d0 * s0)>
util.func @verify_invalid_non_identity_map(%buffer: memref<4xf32, #map>, %idx: index) -> f32 {
  // expected-error @below {{failed to legalize operation 'memref.load'}}
  %0 = memref.load %buffer[%idx] : memref<4xf32, #map>
  util.return %0 : f32
}

If we delete all the expected-error check, the line numbers of conversion to util failed change; it can only be captured once.

Without the change, the error is

module {
}

// -----
within split at /tmp/z.mlir:5 offset :4:8: error: unexpected error: failed to legalize operation 'memref.load' that was explicitly marked illegal
  %0 = memref.load %buffer[%idx, %idx] : memref<4x2xf32>
       ^
within split at /tmp/z.mlir:5 offset :0:0: error: unexpected error: conversion to util failed
// -----
within split at /tmp/z.mlir:12 offset :5:8: error: unexpected error: failed to legalize operation 'memref.load' that was explicitly marked illegal
  %0 = memref.load %buffer[%idx] : memref<4xf32, #map>
       ^
within split at /tmp/z.mlir:12 offset :0:0: error: unexpected error: conversion to util failed

With the change, the error is:

module {
}

// -----
/tmp/z.mlir:8:8: error: unexpected error: failed to legalize operation 'memref.load' that was explicitly marked illegal
  %0 = memref.load %buffer[%idx, %idx] : memref<4x2xf32>
       ^
/tmp/z.mlir:0:0: error: unexpected error: conversion to util failed
// -----
/tmp/z.mlir:16:8: error: unexpected error: failed to legalize operation 'memref.load' that was explicitly marked illegal
  %0 = memref.load %buffer[%idx] : memref<4xf32, #map>
       ^
/tmp/z.mlir:0:0: error: unexpected error: conversion to util failed

The reporting number of both conversion failure becomes 0:0. In this case, the tool is happy with capturing the failure once. I.e., I need to update the checks to

// RUN: iree-opt --split-input-file \
// RUN:   --pass-pipeline='builtin.module(iree-util-test-conversion{widen-integers}, cse, canonicalize)' \
// RUN:   --verify-diagnostics %s | FileCheck %s

// -----
// Must be rank-0 or rank-1.
// expected-error @-7 {{conversion to util failed}}
util.func @verify_invalid_rank_2(%buffer: memref<4x2xf32>, %idx: index) -> f32{
  // expected-error @below {{failed to legalize operation 'memref.load'}}
  %0 = memref.load %buffer[%idx, %idx] : memref<4x2xf32>
  util.return %0 : f32
}

// -----
// Must have an identity map.
#map = affine_map<(d0)[s0] -> (d0 * s0)>
util.func @verify_invalid_non_identity_map(%buffer: memref<4xf32, #map>, %idx: index) -> f32 {
  // expected-error @below {{failed to legalize operation 'memref.load'}}
  %0 = memref.load %buffer[%idx] : memref<4xf32, #map>
  util.return %0 : f32
}

Is it the expected behavior after the change?

…50982)" This reverts commit b6a98b9.

Revert commits: - llvm/llvm-project@b6a98b9: It is not clear if we should only capture the conversion failure once or not. Asking a question here: llvm/llvm-project#150982 (comment) - llvm/llvm-project@330a7e1: see #21522 --------- Signed-off-by: hanhanW <[email protected]>

…50982)" This reverts commit b6a98b9.

Carrying revert: - llvm/llvm-project@b6a98b9: It is not clear if we should only capture the conversion failure once or not. Asking a question here: llvm/llvm-project#150982 (comment) The version numbers are bumped because of iree-org/llvm-project@c2c8644. The comment is dropped because don't have any recent mobile devices to test with today. --------- Signed-off-by: hanhanW <[email protected]>

jpienaar · 2025-07-31T07:54:09Z

Is it the expected behavior after the change?

Yes, so what was previously happening is it was (due to implicit top level Module) inserting a top level op at the point of the split that you were matching against (hence -2, -3). Now the errors are wrt to the file. So either 1) introduce a module op that the error would be reported on, 2) change offset to be to top of file and deduplicate. I think (1) is better.

Example of same case where I updated TF , tensorflow/tensorflow@1318781 , I think it easier to understand without having to think about there actually being an implicit module or split location.

There is a 3rd option come to think of it. One could also make the implicit location to be created at start of slice, that wouldn't be too hard (one would need to do some similar lookup in mlir::parseSourceFile in the source manager and compute a new location). That is another option.

jpienaar · 2025-07-31T11:30:29Z

Option 3 attempt #151499

hanhanW · 2025-07-31T17:58:13Z

Thanks for the suggestions! @jtuyls will take over our integration issue.

jpienaar · 2025-08-01T04:49:00Z

Sounds good, we landed option 3 so potentially no further action needed your side.

Revert commits: - llvm/llvm-project@b6a98b9: It is not clear if we should only capture the conversion failure once or not. Asking a question here: llvm/llvm-project#150982 (comment) - llvm/llvm-project@330a7e1: see iree-org#21522 --------- Signed-off-by: hanhanW <[email protected]>

Carrying revert: - llvm/llvm-project@b6a98b9: It is not clear if we should only capture the conversion failure once or not. Asking a question here: llvm/llvm-project#150982 (comment) The version numbers are bumped because of iree-org/llvm-project@c2c8644. The comment is dropped because don't have any recent mobile devices to test with today. --------- Signed-off-by: hanhanW <[email protected]>

Revert commits: - llvm/llvm-project@b6a98b9: It is not clear if we should only capture the conversion failure once or not. Asking a question here: llvm/llvm-project#150982 (comment) - llvm/llvm-project@330a7e1: see iree-org#21522 --------- Signed-off-by: hanhanW <[email protected]> Signed-off-by: keshavvinayak01 <[email protected]>

Carrying revert: - llvm/llvm-project@b6a98b9: It is not clear if we should only capture the conversion failure once or not. Asking a question here: llvm/llvm-project#150982 (comment) The version numbers are bumped because of iree-org/llvm-project@c2c8644. The comment is dropped because don't have any recent mobile devices to test with today. --------- Signed-off-by: hanhanW <[email protected]> Signed-off-by: keshavvinayak01 <[email protected]>

Revert commits: - llvm/llvm-project@b6a98b9: It is not clear if we should only capture the conversion failure once or not. Asking a question here: llvm/llvm-project#150982 (comment) - llvm/llvm-project@330a7e1: see #21522 --------- Signed-off-by: hanhanW <[email protected]>

Carrying revert: - llvm/llvm-project@b6a98b9: It is not clear if we should only capture the conversion failure once or not. Asking a question here: llvm/llvm-project#150982 (comment) The version numbers are bumped because of iree-org/llvm-project@c2c8644. The comment is dropped because don't have any recent mobile devices to test with today. --------- Signed-off-by: hanhanW <[email protected]>

jpienaar requested review from River707 and joker-eph July 28, 2025 16:13

llvmbot added mlir:core MLIR Core Infrastructure mlir labels Jul 28, 2025

joker-eph reviewed Jul 28, 2025

View reviewed changes

mlir/include/mlir/Support/ToolUtilities.h Outdated Show resolved Hide resolved

mlir/lib/Tools/mlir-translate/MlirTranslateMain.cpp Outdated Show resolved Hide resolved

jpienaar force-pushed the nosplitdiag branch from 45d997f to c82dad9 Compare July 29, 2025 03:01

jpienaar force-pushed the nosplitdiag branch from c82dad9 to 49ba6a5 Compare July 29, 2025 03:04

jpienaar requested a review from joker-eph July 29, 2025 09:38

joker-eph approved these changes Jul 29, 2025

View reviewed changes

jpienaar merged commit b6a98b9 into llvm:main Jul 29, 2025
9 checks passed

hanhanW added a commit to iree-org/llvm-project that referenced this pull request Jul 29, 2025

Revert "[mlir] Report line number from file rather than split (llvm#1…

6edd7f7

…50982)" This reverts commit b6a98b9.

hanhanW mentioned this pull request Jul 29, 2025

Integrate LLVM at llvm/llvm-project@a28e7f1aad3e iree-org/iree#21520

Merged

hanhanW added a commit to iree-org/llvm-project that referenced this pull request Jul 30, 2025

Revert "[mlir] Report line number from file rather than split (llvm#1…

c8184ff

…50982)" This reverts commit b6a98b9.

hanhanW mentioned this pull request Jul 30, 2025

Integrate LLVM at llvm/llvm-project@8e9a0fc0f2e5 iree-org/iree#21533

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[mlir] Report line number from file rather than split #150982

[mlir] Report line number from file rather than split #150982

Uh oh!

jpienaar commented Jul 28, 2025 •

edited

Loading

Uh oh!

llvmbot commented Jul 28, 2025 •

edited

Loading

Uh oh!

github-actions bot commented Jul 28, 2025 •

edited

Loading

Uh oh!

joker-eph left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hanhanW commented Jul 29, 2025

Uh oh!

jpienaar commented Jul 31, 2025 •

edited

Loading

Uh oh!

jpienaar commented Jul 31, 2025

Uh oh!

hanhanW commented Jul 31, 2025

Uh oh!

jpienaar commented Aug 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[mlir] Report line number from file rather than split #150982

[mlir] Report line number from file rather than split #150982

Uh oh!

Conversation

jpienaar commented Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joker-eph left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hanhanW commented Jul 29, 2025

Uh oh!

jpienaar commented Jul 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jpienaar commented Jul 31, 2025

Uh oh!

hanhanW commented Jul 31, 2025

Uh oh!

jpienaar commented Aug 1, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jpienaar commented Jul 28, 2025 •

edited

Loading

llvmbot commented Jul 28, 2025 •

edited

Loading

github-actions bot commented Jul 28, 2025 •

edited

Loading

jpienaar commented Jul 31, 2025 •

edited

Loading